Semantic Matching using Kernel Methods
Author
Abstract
Semantic matching (SM) for textual information can be informally defined as the task of effectively modeling text matching using representations more complex than those based on a simple, independent set of surface forms of words or stems (typically referred to as bag-of-words). In this perspective, matching named entities (NEs) implies that the associated model can both overcome mismatches between different representations of the same entity, e.g., George H. W. Bush vs. George Bush, and carry out entity disambiguation to avoid incorrect matches between different but similar entities, e.g., the entity above and his son George W. Bush. This means that both the context and the structure of NEs must be taken into account in the IR model. SM becomes even more complex when attempting to match the shared semantics of two larger pieces of text, e.g., phrases or clauses, as there is currently no theory indicating how words should be semantically composed to derive the meaning of text.

The complexity above has traditionally led to IR models based on bag-of-words representations in the vector space model (VSM), where (i) the necessary structure is minimally taken into account by considering n-grams or phrases; and (ii) the matching coverage is increased by projecting text into latent semantic spaces or, alternatively, by applying query expansion. Such methods introduce a considerable amount of noise, which in most cases outweighs the benefit of better coverage, thus producing no improvement of the IR system.

In the last decade, a new class of semantic matching approaches based on the so-called Kernel Methods (KMs) for structured data (see, e.g., [4]) has been proposed. KMs also adopt scalar products (which, in this context, take the name of kernel functions) in the VSM. However, KMs introduce two new important aspects:
• the scalar product is implicitly computed using smart techniques, which enable the use of huge feature spaces, e.g., all possible skip n-grams; and
• KMs are typically applied within supervised algorithms, e.g., SVMs, which, by exploiting training data, can filter out irrelevant features and noise.

In this talk, we will briefly introduce and summarize the latest results on kernel methods for semantic matching, focusing on structural kernels. These can be applied to match syntactic and/or semantic representations of text shaped as trees. Several variants are available: the Syntactic Tree Kernel (STK) [2], the String Kernel (SK) [5], and the Partial Tree Kernel (PTK) [4]. Most interestingly, we will present tree kernels exploiting SM between the words contained in a text structure, i.e., the Syntactic Semantic Tree Kernel (SSTK) [1] and the Smoothed Partial Tree Kernel (SPTK) [3]. These extend STK and PTK by allowing for soft matching (i.e., via similarity computation) between nodes associated with different but related labels, e.g., synonyms. The node similarity can be derived from manually annotated resources, e.g., WordNet or Wikipedia, as well as from corpus-based clustering approaches, e.g., latent semantic analysis (LSA). An example of the use of such kernels for question classification in the question answering domain will illustrate the potential of their structural similarity approach.
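To make the soft-matching idea concrete, the sketch below implements a minimal Collins-Duffy-style syntactic tree kernel in which the exact lexical match at pre-terminal nodes can optionally be replaced by a word-similarity score, in the spirit of the SSTK/SPTK smoothing described above. The Tree class, the word_sim function, and the decay factor lam are illustrative assumptions for this example only; the exact recursive formulations in [1, 3] differ in their details.

```python
# A minimal, illustrative sketch of a Collins-Duffy-style syntactic tree kernel
# with an optional soft lexical match, loosely inspired by the SSTK/SPTK idea
# described above. Tree, word_sim, and the decay factor `lam` are assumptions
# made for this example; the formulations in [1, 3] differ in detail.

from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Tree:
    label: str                              # non-terminal symbol, POS tag, or word
    children: List["Tree"] = field(default_factory=list)

    def is_preterminal(self) -> bool:
        # e.g., a POS tag dominating a single word (leaf)
        return len(self.children) == 1 and not self.children[0].children

    def nodes(self) -> List["Tree"]:
        result = [self]
        for child in self.children:
            result.extend(child.nodes())
        return result

    def production(self) -> str:
        return self.label + " -> " + " ".join(c.label for c in self.children)


def tree_kernel(t1: Tree, t2: Tree, lam: float = 0.4,
                word_sim: Optional[Callable[[str, str], float]] = None) -> float:
    """Sum Delta(n1, n2) over all node pairs of the two trees.
    If word_sim is given, lexical nodes are matched softly (smoothed kernel)."""

    def delta(n1: Tree, n2: Tree) -> float:
        if not n1.children or not n2.children:          # bare leaves: no fragment
            return 0.0
        if n1.is_preterminal() and n2.is_preterminal():
            if n1.label != n2.label:                    # POS tags must still match
                return 0.0
            w1, w2 = n1.children[0].label, n2.children[0].label
            if word_sim is not None:
                return lam * word_sim(w1, w2)           # soft (smoothed) match
            return lam if w1 == w2 else 0.0             # hard match
        if n1.production() != n2.production():
            return 0.0
        prod = lam
        for c1, c2 in zip(n1.children, n2.children):
            prod *= 1.0 + delta(c1, c2)
        return prod

    return sum(delta(a, b) for a in t1.nodes() for b in t2.nodes())
```

A small usage example, again with made-up trees and a toy similarity function: for "bought a car" vs. "purchased a car", hard matching loses the verb contribution entirely, while the smoothed kernel still credits the shared VP structure.

```python
# Two VP fragments differing only in near-synonymous verbs.
vp1 = Tree("VP", [Tree("V", [Tree("bought")]),
                  Tree("NP", [Tree("D", [Tree("a")]), Tree("N", [Tree("car")])])])
vp2 = Tree("VP", [Tree("V", [Tree("purchased")]),
                  Tree("NP", [Tree("D", [Tree("a")]), Tree("N", [Tree("car")])])])

sim = lambda a, b: 1.0 if a == b else (0.8 if {a, b} == {"bought", "purchased"} else 0.0)
print(tree_kernel(vp1, vp2))                  # hard matching: verb mismatch ignored
print(tree_kernel(vp1, vp2, word_sim=sim))    # soft matching: strictly higher score
```

In practice, the similarity function would be derived from WordNet, Wikipedia, or LSA word vectors, as mentioned in the abstract, and the kernel would be plugged into an SVM (e.g., via a precomputed Gram matrix) for tasks such as question classification.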
Similar Resources
A procedure for Web Service Selection Using WS-Policy Semantic Matching
In general, policy-based approaches play an important role in the management of web services, for instance in the selection of semantic web services and, in particular, of quality of service (QoS). The present research work illustrates a procedure for web service selection among functionally similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...
A Grammar-driven Convolution Tree Kernel for Semantic Role Classification
The convolution tree kernel has shown promising results in semantic role classification. However, it only carries out hard matching, which may lead to over-fitting and a less accurate similarity measure. To remove this constraint, this paper proposes a grammar-driven convolution tree kernel for semantic role classification by introducing more linguistic knowledge into the standard tree kernel. The prop...
SEMILAR: A Semantic Similarity Toolkit for Assessing Students' Natural Language Inputs
We present in this demo SEMILAR, a SEMantic similarity toolkit. SEMILAR offers in one software environment several broad categories of semantic similarity methods: vectorial methods including Latent Semantic Analysis, probabilistic methods such as Latent Dirichlet Allocation, greedy lexical matching methods, optimal lexico-syntactic matching methods based on word-to-word similarities a...
An Improved Semantic Schema Matching Approach
Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2], and schema integration. This task is defined as finding the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...
Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems
Ontology is the main infrastructure of the Semantic Web, which provides facilities for the integration, searching, and sharing of information on the web. The development of ontologies as the basis of the Semantic Web, and their heterogeneities, have led to the need for ontology matching. With the emergence of large-scale ontologies in real domains, ontology matching systems face problems like memory con...